2.7. 结构体 - republic

在上一章中我们介绍了 C 结构类型。在本章中，我们将深入研究 C 结构体，检查静态和动态分配的结构体，并结合结构体和指针来创建更复杂的数据类型和数据结构。

我们首先快速概述静态声明的结构体。请参阅上一章了解更多详细信息。

2.7.1. C 结构类型回顾

结构体类型表示异构数据集合；它是一种将一组不同类型视为单个连贯单元的机制。

struct在 C 程序中定义和使用类型分为三个步骤：

定义一个struct定义字段值及其类型的类型。
声明struct类型的变量。
使用 点表示 法访问变量中的各个字段值。

在 C 中，结构体是 1.6. 结构体-左值 (它们可以出现在赋值语句的左侧)。struct变量的值是其内存的内容(构成其字段值的所有字节)。当调用带struct参数的函数时，struct参数的值(其所有字段的所有字节的副本)被复制到 struct函数参数(形参)。

使用结构体进行编程时，特别是组合结构体和数组时，仔细考虑每个表达式的类型至关重要。struct 中的每个字段代表一种特定类型，访问字段值的语法以及将各个字段值传递给函数的语义遵循其特定类型。

以下完整的示例程序演示了定义struct类型、声明该类型的变量、访问字段值以及将结构和单个字段值传递给函数。(为了可读性，我们省略了一些错误处理和注释)。

#include <stdio.h>
#include <string.h>

/* define a new struct type (outside function bodies) */
struct studentT {
    char  name[64];
    int   age;
    float gpa;
    int   grad_yr;
};

/* function prototypes */
int checkID(struct studentT s1, int min_age);
void changeName(char *old, char *new);

int main(void) {
    int can_vote;
    // declare variables of struct type:
    struct studentT student1, student2;

    // access field values using .
    strcpy(student1.name, "Ruth");
    student1.age = 17;
    student1.gpa = 3.5;
    student1.grad_yr = 2021;

    // structs are lvalues
    student2 = student1;
    strcpy(student2.name, "Frances");
    student2.age = student1.age + 4;

    // passing a struct
    can_vote = checkID(student1, 18);
    printf("%s %d\n", student1.name, can_vote);

    can_vote = checkID(student2, 18);
    printf("%s %d\n", student2.name, can_vote);

    // passing a struct field value
    changeName(student2.name, "Kwame");
    printf("student 2's name is now %s\n", student2.name);

    return 0;
}

int checkID(struct studentT s, int min_age) {
    int ret = 1;

    if (s.age < min_age) {
        ret = 0;
        // changes age field IN PARAMETER COPY ONLY
        s.age = min_age + 1;
    }
    return ret;
}

void changeName(char *old, char *new) {
    if ((old == NULL) || (new == NULL)) {
        return;
    }
    strcpy(old,new);
}

运行时，程序会产生：

Ruth 0
Frances 1
student 2's name is now Kwame

使用结构体时，考虑 struct 的类型及其字段尤为重要。例如，当将一个 struct 传递给函数时，参数将获取结构体值的副本(参数中所有字节的副本)。因此，对参数字段值的更改不会更改参数的值。前面程序中对 checkID 的调用中说明了此行为，该调用修改了参数的年龄字段。checkID 的更改对相应参数的年龄字段值没有影响。

将 struct 的字段传递给函数时，语义与字段的类型(函数参数的类型)匹配。例如，在对 changeName 的调用中，name 字段的值(结构体 student2 内部数组name的基地址)被复制到参数 old 中，这意味着形参(parameter: old)与内存中数组参数(argument:student2.name)引用相同的数组元素集。因此，更改函数中数组的元素也会更改参数中该元素的值；传递 name 字段的语义与 name 字段的类型相匹配。

2.7.2. 指针和结构体

就像其他 C 类型一样，程序员可以将变量声明为指向用户定义struct类型的指针。使用指针变量的语义struct类似于其他指针类型的语义，例如int *.

考虑struct studentT前面程序示例中引入的类型：

struct studentT {
    char  name[64];
    int   age;
    float gpa;
    int   grad_yr;
};

程序员可以声明类型 struct studentT或struct studentT *(指向 a 的指针struct studentT)的变量：

struct studentT s;
struct studentT *sptr;

// think very carefully about the type of each field when
// accessing it (name is an array of char, age is an int ...)
strcpy(s.name, "Freya");
s.age = 18;
s.gpa = 4.0;
s.grad_yr = 2020;

// malloc space for a struct studentT for sptr to point to:
sptr = malloc(sizeof(struct studentT));
if (sptr == NULL) {
    printf("Error: malloc failed\n");
    exit(1);
}

请注意调用 malloc 去初始化 sptr 指向堆内存中动态分配的结构。使用 sizeof 运算符来计算 malloc’s size request (e.g., `sizeof(struct studentT)) 确保 malloc 为结构中的所有字段值分配空间。

要访问指向一个 struct 的指针中的各个字段，首先需要取消引用(dereferenced)该指针变量。根据指针取消引用的规则，您可能会想访问struct如下字段：

// the grad_yr field of what sptr points to gets 2021:
(*sptr).grad_yr = 2021;

// the age field of what sptr points to gets s.age plus 1:
(*sptr).age = s.age + 1;

然而，由于指向结构体的指针非常常用，C 提供了一种特殊的运算符 ( →)，它可以取消引用一个 struct，同时访问其字段值。例如，sptr→year相当于(*sptr).year.以下是使用此表示法访问字段值的一些示例：

// the gpa field of what sptr points to gets 3.5:
sptr->gpa = 3.5;

// the name field of what sptr points to is a char *
// (can use strcpy to init its value):
strcpy(sptr->name, "Lars");

图 1概述了上述代码执行后变量s和 sptr 在内存中的样子。回想一下，malloc从堆中分配内存，而局部变量在堆栈上分配。

All the fields of struct s (Freya) are stored on the stack. The sptr pointer on the stack stores the heap address of another student struct (Lars).

图 1. 静态分配的结构(栈上的数据)和动态分配的结构(堆上的数据)之间内存布局的差异。

2.7.3. 结构体中的指针字段

结构体也可以定义为将指针类型作为字段值。例如：

struct personT {
    char *name;     // for a dynamically allocated string field
    int  age;
};

int main(void) {
    struct personT p1, *p2;

    // need to malloc space for the name field:
    p1.name = malloc(sizeof(char) * 8);
    strcpy(p1.name, "Zhichen");
    p1.age = 22;


    // first malloc space for the struct:
    p2 = malloc(sizeof(struct personT));

    // then malloc space for the name field:
    p2->name = malloc(sizeof(char) * 4);
    strcpy(p2->name, "Vic");
    p2->age = 19;
    ...

    // Note: for strings, we must allocate one extra byte to hold the
    // terminating null character that marks the end of the string.
}

在内存中，这些变量将如图2所示(注意哪些部分分配在堆栈上，哪些部分分配在堆上)。

Example struct with a pointer field type 图 2. 具有指针字段的结构在内存中的布局。

随着结构及其字段类型的复杂性增加，请注意它们的语法。要正确访问字段值，请从最外层的变量类型开始，并使用其类型语法来访问各个部分。例如，表 1struct中所示的变量类型决定了程序员应如何访问其字段。

表 1. 结构体字段访问示例

Expression	Type	Field Access Syntax
p1	struct personT	p1.age, p1.name
p2	struct personT *	p2->age, p2->name

此外，了解字段值的类型允许程序使用正确的语法来访问它们，如表 2中的示例所示。

表 2. 访问不同的结构体字段类型

Expression	Type	Example Access Syntax
p1.age	int	p1.age = 18;
p2->age	int	p2->age = 18;
p1.name	char *	printf("%s", p1.name);
p2->name	char *	printf("%s", p2->name);
p1.name[2]	char	p1.name[2] = 'a';
p2->name[2]	char	p2->name[2] = 'a';

在检查最后一个示例时，首先考虑最外层变量的类型(p2是指向 struct personT 的指针)。因此，要访问结构体中的字段值，程序员需要使用→语法 ( p2→name)。接下来，考虑 name字段的类型 char *，在该程序中用于指向char值数组。要通过name字段访问特定char 存储位置，请使用数组索引表示法：p2→name[2] = 'a'。

2.7.4.结构体数组

数组、指针和结构体可以组合起来创建更复杂的数据结构。以下是声明不同类型的结构数组变量的一些示例：

struct studentT classroom1[40];   // an array of 40 struct studentT

struct studentT *classroom2;      // a pointer to a struct studentT
                                  // (for a dynamically allocated array)

struct studentT *classroom3[40];  // an array of 40 struct studentT *
                                  // (each element stores a (struct studentT *)

同样，为了理解在程序中使用这些变量的语法和语义，必须仔细考虑变量和字段类型。以下是访问这些变量的正确语法的一些示例：

// classroom1 is an array:
//    use indexing to access a particular element
//    each element in classroom1 stores a struct studentT:
//    use dot notation to access fields
classroom1[3].age = 21;

// classroom2 is a pointer to a struct studentT
//    call malloc to dynamically allocate an array
//    of 15 studentT structs for it to point to:
classroom2 = malloc(sizeof(struct studentT) * 15);

// each element in array pointed to by classroom2 is a studentT struct
//    use [] notation to access an element of the array, and dot notation
//    to access a particular field value of the struct at that index:
classroom2[3].year = 2013;

// classroom3 is an array of struct studentT *
//    use [] notation to access a particular element
//    call malloc to dynamically allocate a struct for it to point to
classroom3[5] = malloc(sizeof(struct studentT));

// access fields of the struct using -> notation
// set the age field pointed to in element 5 of the classroom3 array to 21
classroom3[5]->age = 21;

采用类型数组struct studentT作为参数的函数可能如下所示：

void updateAges(struct studentT *classroom, int size) {
    int i;

    for (i = 0; i < size; i++) {
        classroom[i].age += 1;
    }
}

程序可以向此函数传递静态或动态分配的数组struct studentT：

updateAges(classroom1, 40);
updateAges(classroom2, 15);

classroom1传递( 或classroom2) 给 updateAges 的语义与将静态声明(或动态分配)数组传递给函数的语义相匹配：形参(parameter)与实参(argument)引用相同的元素集，因此函数内数组值的更改会影响实参数组的元素。

图 3显示了第二次调用该函数时堆栈的样子updateAges(显示了传递的classroom2数组，其中每个元素中都有结构体的示例字段值)。

Main’s classroom2 variable points to an array of studentT structs on the heap. When classroom2 gets passed to updateAges, it makes a copy of the pointer, yielding another pointer that points to the same heap array.

图 3. 传递给函数的 struct StudentT 数组的内存布局。

与往常一样，参数获取其参数值的副本(堆内存中数组的内存地址)。因此，修改函数中数组的元素将保留其参数的值(形参(parameter)和实参(argument)都引用内存中的同一数组)。

该updateAges函数无法传递classroom3数组，因为它的类型与参数的类型不同：classroom3是 struct studentT * 的数组，而不是 struct studentT 的数组。

2.7.5. 自指结构(数据结构)

一个结构体可以定义一个指向与自己类型相同的 struct 字段。这些自引用(self-referential)的 struct 类型可用于构建数据结构的链接实现，例如链表、树和图。

这些数据类型及其链接实现的详细信息超出了本书的范围。然而，我们简要展示了一个如何在 C 中定义和使用自引用struct类型来创建链表的示例。有关链表的更多信息，请参阅数据结构和算法教科书。

链表是实现列表抽象数据类型的一种方法。列表表示按元素在列表中的位置排序的元素序列。在 C 语言中，列表数据结构可以实现为数组或链表，使用自引用 struct 类型来存储列表中的各个节点。

为了构建后者，程序员将定义一个 node 结构体来包含一个列表元素和到列表中下一个节点的链接。下面是一个可以存储整数值链接列表的示例：

struct node {
    int data;           // used to store a list element's data value
    struct node *next;  // used to point to the next node in the list
};

这种类型的实例struct可以通过 next字段链接在一起以创建链表。

head 此示例代码片段创建一个包含三个元素的链表(该列表本身由指向列表中第一个节点的变量引用)：

struct node *head, *temp;
int i;

head = NULL;  // an empty linked list

head = malloc(sizeof(struct node));  // allocate a node
if (head == NULL) {
    printf("Error malloc\n");
    exit(1);
}
head->data = 10;    // set the data field
head->next = NULL;  // set next to NULL (there is no next element)

// add 2 more nodes to the head of the list:
for (i = 0; i < 2; i++) {
    temp = malloc(sizeof(struct node));  // allocate a node
    if (temp == NULL) {
        printf("Error malloc\n");
        exit(1);
    }
    temp->data = i;     // set data field
    temp->next = head;  // set next to point to current first node
    head = temp;        // change head to point to newly added node
}

请注意，该temp变量临时指向一个被初始化的 malloc 的 node 节点，然后通过将其 next 字段设置为指向当前 head 指向的节点(头结点)，然后将 head 更改为指向这个新节点(temp指向的节点)，将其添加到列表的开头。

执行此代码的结果在内存中类似于图 4。

Two stack variables, head and temp, contain the address of the first node on the heap. The first node’s next field points to the second node, whose next field points to the third. The third node’s next pointer is null, indicating the end of the list.

图 4. 三个示例链表节点在内存中的布局。